Fast and Memory-Efficient Discovery of the Top-k Relevant Subgroups in a Reduced Candidate Space

Authors

  • Henrik Grosskreutz
  • Daniel Paurat
Abstract

We consider a modified version of the top-k subgroup discovery task, where subgroups dominated by other subgroups are discarded. The advantage of this modified task, known as relevant subgroup discovery, is that it avoids redundancy in the outcome. Although it has been applied in many applications, so far no efficient exact algorithm for this task has been proposed. Most existing solutions do not guarantee the exact solution (as a result of the use of non-admissible heuristics), while the only exact solution relies on the explicit storage of the whole search space, which results in prohibitively large memory requirements. In this paper, we present a new top-k relevant subgroup discovery algorithm which overcomes these shortcomings. Our solution is based on the fact that if an iterative deepening approach is applied, the relevance check – which is the root of the problems of all other approaches – can be realized based solely on the best k subgroups visited so far. The approach also allows for the integration of admissible pruning techniques like optimistic estimate pruning. The result is a fast, memory-efficient algorithm which clearly outperforms existing top-k relevant subgroup discovery approaches. Moreover, we analytically and empirically show that it is competitive with simpler approaches which do not consider the relevance criterion.
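The two ingredients named in the abstract, a relevance check against only the best k subgroups found so far and optimistic estimate pruning, can be illustrated with a minimal sketch. It is not the authors' algorithm: it leaves out the iterative deepening traversal and the reduced candidate space entirely. Several details are assumptions the abstract does not state: the Piatetsky-Shapiro quality function, its usual optimistic estimate, and the domination rule (a subgroup is dominated if another one covers all of its positive records and no negative record outside it). All function names are hypothetical.

```python
from typing import FrozenSet, List, Tuple

# A subgroup is represented only by the records it covers:
# (indices of covered positives, indices of covered negatives).
Subgroup = Tuple[FrozenSet[int], FrozenSet[int]]


def quality(sg: Subgroup, n_pos: int, n_total: int) -> float:
    """Piatetsky-Shapiro quality TP - p0 * |extension| (assumed quality function)."""
    tp, fp = sg
    p0 = n_pos / n_total
    return len(tp) - p0 * (len(tp) + len(fp))


def optimistic_estimate(sg: Subgroup, n_pos: int, n_total: int) -> float:
    """Upper bound for any refinement: it keeps at most all TPs and at best no FPs."""
    tp, _ = sg
    return len(tp) * (1 - n_pos / n_total)


def dominates(a: Subgroup, b: Subgroup) -> bool:
    """Assumed rule: a covers every positive of b and no negative outside b."""
    return b[0] <= a[0] and a[1] <= b[1]


def can_prune(best: List[Subgroup], cand: Subgroup, k: int,
              n_pos: int, n_total: int) -> bool:
    """True if no refinement of cand can beat the current k-th best quality."""
    if len(best) < k:
        return False
    kth_quality = quality(best[-1], n_pos, n_total)  # best is sorted, best first
    return optimistic_estimate(cand, n_pos, n_total) <= kth_quality


def offer(best: List[Subgroup], cand: Subgroup, k: int,
          n_pos: int, n_total: int) -> bool:
    """Try to add cand to the top-k list; return True if it was kept."""
    # Relevance check against the current best k only, not the whole search space.
    if any(dominates(s, cand) for s in best):
        return False
    best.append(cand)
    best.sort(key=lambda s: quality(s, n_pos, n_total), reverse=True)
    del best[k:]  # keep at most k entries
    return cand in best
```

In a full search, `can_prune` would be called before expanding a candidate and `offer` when a candidate is visited. The sketch is only sound if dominating subgroups are visited before the subgroups they dominate, which is the property the abstract attributes to the iterative deepening order.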

Similar resources

Fast Discovery of Relevant Subgroups using a Reduced Search Space

We consider a modified version of the local pattern discovery task of subgroup discovery, where subgroups dominated by other subgroups are discarded. The advantage of this modified task, known as relevant subgroup discovery, is that it avoids redundancy in the outcome. Although it was considered in many applications, so far no efficient and exact algorithm for this task has been proposed. One p...

Fiber bundles and Lie algebras of top spaces

In this paper, using the Frobenius theorem, a relation between the Lie subalgebras of the Lie algebra of a top space T and the Lie subgroups of T (as a Lie group) is determined. As a result, these spaces can be studied through their Lie algebras. We show that a top space with a finite number of identity elements is a C^{∞} principal fiber bundle; by this method we can characterize top spaces.

An Efficient Method for Mining Frequent Itemsets in Market Basket Data Analysis

Discovery of hidden and valuable knowledge from large data warehouses is an important research area and has attracted the attention of many researchers in recent years. Most of Association Rule Mining (ARM) algorithms start by searching for frequent itemsets by scanning the whole database repeatedly and enumerating the occurrences of each candidate itemset. In data mining problems, the size of ...

A New Fast and Efficient HMM-Based Face Recognition System Using a 7-State HMM Along With SVD Coefficients

In this paper, a new Hidden Markov Model (HMM)-based face recognition system is proposed. As a novel point, in contrast to the five-state HMM used in previous research, we use a 7-state HMM to cover more details. Specifically, we add two new face regions, eyebrows and chin, to the model. As another novel point, we use a small number of quantized Singular Value Decomposition (SVD) coefficients as feature...

FDTD Analysis of Top-Hat Monopole Antennas Loaded with Radially Layered Dielectric

Top-hat monopole antennas loaded with radially layered dielectric are analyzed using the finite-difference time-domain (FDTD) method. Unlike the mode-matching method (MMM) (which was previously used for analyzing these antennas) the FDTD method enables us to study such structures accurately and easily. Using this method, results can be obtained in a wide frequency band by performing only one ti...

Publication date: 2011